Use Pixi for dependency management #4

jayqi · 2024-07-29T23:50:53Z (edited by klwetstone)

Sets up dependency management with Pixi instead of conda-lock / conda, including both generating lock files and managing the environment. Also prevents logging outside of smoke tests (resolves issue 83)

Generating lockfiles

The lockfile is now generated inside a Docker container because of a known Pixi limitation: it can't solve PyPI dependencies for a platform other than the one it is running on. To generate runtime/pixi.lock, the command make update-lockfile:

1. Builds a Docker image from Dockerfile-lock. This new Dockerfile only runs the command that generates pixi.lock; it doesn't install any dependencies or run the submission.
2. Uses docker create to create a dummy container from the image without running it.
3. Copies pixi.lock from the dummy container back to the host.
4. Deletes the dummy container.

Having a separate Dockerfile allows us to update the lockfile more quickly. If we used the existing Dockerfile, the full submission would run every time we need to update the lockfile.
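The four steps above can be sketched with Python's subprocess module, assuming the Docker CLI is on PATH and the script runs from the repository root. The image tag, container name, build context, and the location of pixi.lock inside the image (/app/pixi.lock) are hypothetical placeholders, not the values used by the actual Makefile target.

```python
import subprocess
from typing import List

# Hypothetical names; the real Makefile target may use different values.
IMAGE_TAG = "runtime-lock"
CONTAINER_NAME = "runtime-lock-dummy"
LOCKFILE_IN_IMAGE = "/app/pixi.lock"  # assumed location inside the image


def lockfile_commands(
    image_tag: str = IMAGE_TAG,
    container_name: str = CONTAINER_NAME,
    lockfile_in_image: str = LOCKFILE_IN_IMAGE,
    host_lockfile: str = "runtime/pixi.lock",
) -> List[List[str]]:
    """Return the docker commands for the four steps, in order."""
    return [
        # 1. Build the image; Dockerfile-lock generates pixi.lock at build time.
        ["docker", "build", "-f", "runtime/Dockerfile-lock", "-t", image_tag, "runtime"],
        # 2. Create (but do not start) a dummy container from the image.
        ["docker", "create", "--name", container_name, image_tag],
        # 3. Copy the generated lockfile from the container back to the host.
        ["docker", "cp", f"{container_name}:{lockfile_in_image}", host_lockfile],
        # 4. Delete the dummy container.
        ["docker", "rm", container_name],
    ]


def update_lockfile() -> None:
    """Run the steps, removing the dummy container even if the copy fails."""
    build, create, copy, remove = lockfile_commands()
    subprocess.run(build, check=True)
    subprocess.run(create, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        subprocess.run(remove, check=True)
```

Keeping the command construction separate from execution makes it easy to inspect the exact docker invocations without running them; calling update_lockfile() performs the build and copy.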

Running make update-lockfiles from scratch (with no existing pixi.lock) takes about 2 minutes. I also tested that it works with some PyPI packages included.

Outstanding

Check whether we still need test_lockfile.py: try to install the same package from both conda and pip and see whether Pixi allows it. Result: Pixi resolves this for us, so we don't need test_lockfile.py.
- The previous test_lockfile.py just checked whether there were both conda and pip versions of the same package.
- It's possible that Pixi automatically reconciles pip dependencies with conda dependencies in a way that conda-lock does not: pixi list shows one entry for each installed package, and that entry comes from either PyPI or conda. Scrap work is in this commit.
Make sure entrypoint.sh runs python main.py in the correct environment
- One change from conda-lock is that we use pixi run to run main.py. This means we also need to tell Pixi which environment to use, based on CPU_OR_GPU, so we set CPU_OR_GPU as an environment variable in the Dockerfile.
Update README (tracked in separate project issue)
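For reference, the check the old test_lockfile.py performed (is any package pinned from both conda and PyPI?) can be sketched over already-parsed lockfile entries. The record shape here, dicts with name and kind keys, is an assumption for illustration and not a guaranteed description of the pixi.lock schema; a real check would first load the file with a YAML parser.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def normalize(name: str) -> str:
    """PEP 503-style normalization so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def packages_from_both_sources(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return names that are locked from both conda and PyPI.

    Each record is assumed to look like {"name": ..., "kind": "conda" | "pypi"}.
    """
    sources: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        sources[normalize(record["name"])].add(record["kind"])
    return sorted(name for name, kinds in sources.items() if {"conda", "pypi"} <= kinds)
```

An empty result corresponds to the "one entry per package" behavior observed with pixi list, which is why the standalone test was dropped.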

Background about Pixi from JQ

Pixi is the new thing in the Conda ecosystem that has been gaining momentum for a while. It's made by the team that makes Mamba. They claim to be production-ready.

What benefits does Pixi have?

Lockfiles are a primary part of their default workflow. There's a lot of focus on doing it well.
The UX is pretty nice. Conflict messages are pretty clear.
It's fast because it's written in Rust.
It does minimal updates to lockfiles, i.e., if you change something and existing versions in your lockfile already satisfy the constraint, it won't change them. conda-lock does not do this.
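The minimal-update behavior can be illustrated with a toy rule: keep the locked version when it still satisfies the new constraint, and only otherwise pick the newest available version that does. This is a simplified model for illustration, not Pixi's actual implementation; versions are plain integer tuples and each constraint is a hypothetical half-open range [lower, upper).

```python
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def satisfies(version: Version, lower: Version, upper: Version) -> bool:
    """True if lower <= version < upper, using tuple comparison."""
    return lower <= version < upper


def choose_version(
    locked: Optional[Version],
    available: List[Version],
    lower: Version,
    upper: Version,
) -> Optional[Version]:
    """Keep the locked version if it still fits; otherwise take the newest fit."""
    if locked is not None and satisfies(locked, lower, upper):
        return locked  # minimal update: nothing changes
    candidates = [v for v in available if satisfies(v, lower, upper)]
    return max(candidates) if candidates else None
```

With a lock at (1, 24, 0) and a constraint of [1.20, 2.0), the sketch keeps (1, 24, 0), whereas always re-solving for the newest fit would move to (1, 26, 4) given the same available versions.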

Some notes on implementation:

I've shoved common dependencies into ~~the default environment~~ a "base" feature that gets inherited by cpu and gpu. ~~The default environment on its own shouldn't be used though.~~
I saw that PyTorch has a new thing where they have pytorch-cuda and cpuonly metapackages to help you pin, so I'm using those. It's how they do it in their official docs.
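To make "inherited" concrete: in Pixi, an environment is assembled from a list of features, so cpu and gpu each combine the shared base feature with their own extras. The toy model below merges per-feature dependency tables to show that composition. The feature contents are illustrative placeholders, not the real pixi.toml, and treating a conflicting spec as an error is a choice of this sketch rather than documented Pixi behavior.

```python
from typing import Dict, List

# Illustrative dependency tables; not copied from the real pixi.toml.
FEATURES: Dict[str, Dict[str, str]] = {
    "base": {"python": "3.10.*", "pytorch": "2.1.*"},
    "cpu": {"cpuonly": "*"},
    "gpu": {"pytorch-cuda": "11.8.*"},
}

# Each environment lists the features it is built from.
ENVIRONMENTS: Dict[str, List[str]] = {
    "cpu": ["base", "cpu"],
    "gpu": ["base", "gpu"],
}


def compose(env: str) -> Dict[str, str]:
    """Merge the dependency tables of every feature in an environment."""
    merged: Dict[str, str] = {}
    for feature in ENVIRONMENTS[env]:
        for name, spec in FEATURES[feature].items():
            if name in merged and merged[name] != spec:
                raise ValueError(f"{name}: {merged[name]!r} vs {spec!r} in {env}")
            merged[name] = spec
    return merged
```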

Running it yourself:

Install Pixi. I installed it with brew install pixi on macOS.
To run the locking, you can run pixi ls or pixi tree, which are normally basically no-ops but will trigger Pixi to check the lockfile for freshness.
- If nothing happens, it means the lockfile satisfies current constraints.
- If you want to force it to rerun, rm pixi.lock first.
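The two bullets above can be wrapped in a small helper: optionally delete pixi.lock, then run pixi tree so Pixi checks the lockfile and re-solves if needed. The injectable runner and the manifest_dir argument are hypothetical conveniences of this sketch; only the pixi tree --manifest-path invocation appears elsewhere in this thread.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Default runner: execute the command and fail loudly on errors."""
    subprocess.run(cmd, check=True)


def relock(manifest_dir: Path, force: bool = False, runner: Runner = run_checked) -> List[str]:
    """Optionally delete pixi.lock, then trigger Pixi's freshness check."""
    lockfile = manifest_dir / "pixi.lock"
    if force and lockfile.exists():
        lockfile.unlink()  # same effect as `rm pixi.lock`
    cmd = ["pixi", "tree", "--manifest-path", str(manifest_dir / "pixi.toml")]
    runner(cmd)
    return cmd
```

For example, relock(Path("runtime"), force=True) would regenerate runtime/pixi.lock from scratch.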

Some helpful references:

Also, I saw the conda-lock maintainers are considering endorsing Pixi as a better default solution, which feels like another reason to move away from conda-lock.

klwetstone

@jayqi This is excellent. 👏 👏 An enormous thank you for digging into this and finding a much more efficient solution!!

Poking around, I agree and like the idea of switching to Pixi. It's faster and easy to work with, a really great suggestion.

Pixi is a full environment management tool. It supports Conda and PyPI packages but is not interoperable with Conda environments. That means we'll need to update our Dockerfile and entrypoint.sh script to use Pixi. I think this is probably fine: I expect pixi install -e gpu and pixi run -e gpu to work.
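A minimal sketch of the environment-selection logic that entrypoint.sh needs, written in Python for illustration: read CPU_OR_GPU, reject anything other than cpu or gpu, and build the pixi run -e command for main.py. Lowercasing and trimming the value is a convenience added by this sketch.

```python
import os
from typing import List, Mapping, Optional

VALID_ENVIRONMENTS = ("cpu", "gpu")


def pixi_run_command(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build `pixi run -e <env> python main.py` from the CPU_OR_GPU variable."""
    environ = os.environ if environ is None else environ
    env = environ.get("CPU_OR_GPU", "").strip().lower()
    if env not in VALID_ENVIRONMENTS:
        raise ValueError(f"CPU_OR_GPU must be 'cpu' or 'gpu', got {env!r}")
    return ["pixi", "run", "-e", env, "python", "main.py"]
```

Failing fast on an unset or misspelled CPU_OR_GPU avoids silently running the submission in the wrong environment.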

Before officially switching over, I'd like to poke around with making these updates to check if we run into any additional issues.

pixi.toml

jayqi · 2024-07-30T15:30:22Z

@klwetstone it'll also be good to throw in some PyPI packages just to make sure that works as expected.

jayqi · 2024-07-30T15:30:59Z

@klwetstone you should just take over this branch and push any commits you think make sense.

klwetstone · 2024-07-30T19:25:05Z

Documenting the steps for updating the Docker container:

Updated all commands.
Installing with Pixi leads to an error:

(base) root@02b4239deb18:/tmp# pixi install --manifest-path /tmp/pixi.toml -e gpu
⠴ creating environment 'gpu'
    download & extract   [00:06:10] [━━━━━━━━━━━━━━━━━━━━] 7.89 GiB @ 21.83 MiB/s pytorch
    installing packages  [00:06:10] [━━━━━━━━━━━━━━━━━━━━]   294/294
  × failed to fetch pytorch-2.1.1-py3.10_cuda11.8_cudnn8.7.0_0.tar.bz2
  ├─▶ an io error occurred
  ├─▶ failed to unpack `/root/.cache/rattler/cache/pkgs/pytorch-2.1.1-py3.10_cuda11.8_cudnn8.7.0_0/info/files`
  ├─▶ failed to unpack `info/files` into `/root/.cache/rattler/cache/pkgs/pytorch-2.1.1-py3.10_cuda11.8_cudnn8.7.0_0/info/files`
  ╰─▶ No space left on device (os error 28)

We may need to change the base image from mambaorg/micromamba:1.5.3-bookworm-slim. If we use a slimmer base image, we may have enough space to install and use Pixi.
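Since the install only failed after downloading 7.89 GiB for pytorch, a pre-flight check on free space can fail sooner with a clearer message. This sketch uses shutil.disk_usage from the standard library; the 10 GiB default threshold is an assumption, not a measured requirement, and the path passed in must already exist (for example the parent of the package cache).

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def check_free_space(path: str, required_gib: float = 10.0) -> None:
    """Raise early if less than `required_gib` is free at `path`."""
    available = free_gib(path)
    if available < required_gib:
        raise OSError(
            f"only {available:.1f} GiB free at {path}; need at least {required_gib} GiB"
        )
```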

jayqi · 2024-07-30T21:21:39Z

@klwetstone Yeah, we don't need the Micromamba image because Pixi is entirely an alternative to Micromamba.

Looks like there are official Pixi Docker images: https://github.com/prefix-dev/pixi-docker

There's a bookworm-slim-based one and also Nvidia-based ones.

jayqi · 2024-07-30T21:23:26Z

(In case you didn't know, bookworm refers to a particular version of Debian, which is a Linux distribution, and slim is just the lightweight version of it.)

klwetstone · 2024-07-30T21:25:07Z

So weird -- I searched Docker Hub and that one didn't come up. I ended up making it work using nvidia/cuda:11.8.0-base-ubuntu22.04 as the base image and just installing Pixi (current Dockerfile on this branch).

jayqi · 2024-07-30T21:27:03Z

They're not on Docker Hub, they're on a GitHub Container Registry repository.

jayqi · 2024-07-30T21:31:11Z

@klwetstone I'd recommend using the images they already publish. https://github.com/prefix-dev/pixi-docker/pkgs/container/pixi

Firstly, you won't need to muck with installing Pixi. Secondly, that lets you easily pin to a particular version of Pixi, instead of installing the latest version each time that you build.

klwetstone · 2024-07-30T21:34:23Z

Agreed -- thank you! I didn't think to search GitHub Container Registry in addition to Docker Hub

jayqi · 2024-07-30T21:38:41Z

> Agreed -- thank you! I didn't think to search GitHub Container Registry in addition to Docker Hub

The thing to do here is to just google "whatever docker image" instead of searching on Docker Hub. Docker Hub is a commercial service that costs money, so open source projects won't always be there.

klwetstone · 2024-07-31T15:16:23Z

> @klwetstone it'll also be good to throw in some PyPI packages just to make sure that works as expected.

@jayqi I ran into some interesting snags here. I'm continuing to debug, but I'm curious if you have any initial thoughts about the right directions to investigate. When I add PyPI dependencies, Pixi only runs in the Docker container and errors on my Mac. I haven't fully confirmed, but it doesn't seem like this happens with conda-lock. I'm not finding great documentation on what conda-lock actually does under the hood, so I'm not sure.

Some hypotheses I have for what might be going on here:

There's some other difference between Pixi and conda-lock in their ability to check dependencies for a platform different from the current machine. Pixi has a bunch of commands to run things in the specified environment, so I wonder whether it always expects the current machine to be the same as the specified platform.
There could also be some snags in Pixi's support for PyPI.

I added to pixi.toml:

[feature.cpu.pypi-dependencies]
pytest = {version = "*"}
chromadb = { version = "*" }
sacremoses = { version = "*" }

Error when I run on my mac:

$ pixi tree --manifest-path runtime/pixi.toml --platform linux-64
  × Unable to solve pypi dependencies for the cpu environment because no compatible python interpreter can be installed for the current platform
   ╭─[4:13]
 3 │ channels = ["nvidia", "conda-forge", "pytorch", "xformers"]
 4 │ platforms = ["linux-64"]
   ·             ──────┬─────
   ·                   ╰── even though the projects does include support for 'osx-64'
 5 │ 
   ╰────
  help: Try converting your [pypi-dependencies] to conda [dependencies]

(the above runs fine in the docker container)

jayqi · 2024-07-31T15:50:50Z

Okay, I think the quick fix is that you're going to have to add osx-64 to the platforms. We're using Pixi in a way that I think isn't 100% intended, so this complaint feels unnecessary; it's just how things are set up right now.

jayqi · 2024-07-31T16:47:22Z

@klwetstone alternative idea: does it make sense to set up a Docker entrypoint that runs the locking inside a container? That way, we can still run the locking without needing to add macOS or Windows to the platforms. (While this PyPI kludge is a thing.) One possible other benefit there is that someone can run the locking without needing to install Pixi.

jayqi · 2024-07-31T17:03:31Z

Okay this is a known limitation in Pixi right now: prefix-dev/pixi#1130

Basically, because of the way Python packages work (packages can run arbitrary Python code to set their package metadata), Pixi needs a Python interpreter to resolve the PyPI packages. Because your OS (osx-64) is not listed in the platforms, Pixi is unable to install a Python that fits the requirements. By my understanding, this is a technically correct thing. I think it's possible they come up with something smart that can work around this in the future, or they relax some assumptions (I think conda-lock and pip-compile make assumptions that platform and/or Python versions don't matter), but for now this is a limitation.
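The limitation can be restated as a simple rule: PyPI dependencies can only be solved when the current machine's platform is listed in platforms, because Pixi has to install a Python interpreter for it. The sketch encodes that rule as described in this thread (a simplification, not Pixi's code). The mapping from platform.system()/platform.machine() to conda platform names covers only common cases and is an assumption.

```python
import platform
from typing import Iterable, Optional

# Common (system, machine) pairs mapped to conda platform names; not exhaustive.
_SUBDIRS = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Darwin", "x86_64"): "osx-64",
    ("Darwin", "arm64"): "osx-arm64",
    ("Windows", "AMD64"): "win-64",
}


def conda_platform(system: Optional[str] = None, machine: Optional[str] = None) -> Optional[str]:
    """Best-effort conda platform name for this (or the given) machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return _SUBDIRS.get((system, machine))


def can_solve_pypi_here(platforms: Iterable[str], has_pypi_deps: bool, current: Optional[str]) -> bool:
    """Conda-only projects can lock anywhere; PyPI deps need the current platform listed."""
    if not has_pypi_deps:
        return True
    return current is not None and current in set(platforms)
```

On an Intel Mac with platforms = ["linux-64"] and PyPI dependencies, the rule fails; it passes either by adding osx-64 to platforms or by running the locking where the current platform is linux-64, which are the two approaches listed next.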

The two approaches here would be:

1. Add osx-64 (and maybe Windows if we want Windows participants to be able to run locking) to the platforms. This will make Pixi unnecessarily try to resolve packages that work on all 2 or 3 platforms (from the POV that we only care about Linux for code execution).
2. Run locking in a Linux Docker container.

klwetstone · 2024-08-01T16:16:19Z

Thank you this is extremely helpful!!

Add osx-64 (and maybe Windows if we want Windows participants to be able to run locking) to the platforms. (per your suggestion)

I don't think this will work with our GPU environment. When I add osx-64, I'm getting:

  × failed to solve the conda requirements of 'gpu' 'osx-64'
  ╰─▶ Cannot solve the request because of: No candidates were found for cudatoolkit
      ==11.8.

This makes sense, because we can't install cudatoolkit on a mac.

I also see an option 3, which is to stick with conda-lock for now and get it to run more efficiently by significantly simplifying the requirements (a la this comment). I'm going to see if I can get option 3 working for now, mainly in the interest of time so I can test some different node sizes.

I do think option 2 (using Pixi) is the better long-term solution, because conda-lock might just get extremely slow again once participants start adding packages. We could have another Dockerfile that just updates and checks the lockfiles, and run that Dockerfile with make update-lockfiles.

jayqi · 2024-08-01T16:23:11Z

@klwetstone FWIW, conda-lock being slow like what you experienced is not normal behavior, and should either be considered a bug or a pathological edge case that should be addressed rather than lived with.

klwetstone · 2024-08-01T16:32:13Z

Thanks! For my understanding, by "pathological edge case" do you mean it's more than just unresolvable dependencies, i.e., it may be related to my installation of conda and not just something that can be solved by loosening dependencies? Either way, I think it's a good idea to test running conda-lock on some super-basic YAML files to check that it runs in a reasonable time.

jayqi · 2024-08-02T14:48:17Z

"related to my installation of conda" = bug

"pathological edge case" = your specific set of dependencies and versions and the solver interact in a specific way where the algorithm works in an especially inefficient way. https://en.wikipedia.org/wiki/Worst-case_complexity

klwetstone · 2024-08-06T18:01:00Z

@jayqi I think I have everything set up to use pixi instead of conda-lock, do you have time to review and make sure the new approach makes sense? Details are in the updated PR description

- Fix Dockerfiles
- Fix permissions and directories and stuff
- Update test command
- Fix command ordering
- Use clean cache command
- Add maximize build space action
- Add more root reserve space
- Remove unwanted software directly
- Add some diagnostics
- Print out pixi info
- Fix typo

---------

Co-authored-by: Jay Qi <[email protected]>

jayqi · 2024-08-09T16:00:20Z

There's still something weird and horrible going on with the GPU image. I've spent too much time on this already, but just some loose ideas:

Test is failing:

   × The platform you are running on should at least have the virtual package
  │ __cuda on version 11.8, build_string: 0

What is this? This error comes from Pixi. __cuda is a "virtual package", a conda-ecosystem concept that Pixi uses to describe features of the host system rather than anything installed into the environment; __cuda stands for the CUDA version supported by the installed NVIDIA driver. If you run pixi info (I do this in the Docker build command; you can see it in the logs here), it lists the detected virtual packages.

In theory, if we have CUDA set up correctly, Pixi should "detect" this as an available virtual package. If you google this, there have been bugs in the past, but nothing relevant that is obviously still open.
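One way to see what a __cuda virtual package would be based on is to ask the NVIDIA driver directly. This sketch calls cuDriverGetVersion from the CUDA driver API via ctypes, assuming the driver library is exposed as libcuda.so.1 (the usual Linux name). It returns None when the library can't be loaded, for example in a container started without GPU access. It illustrates the idea and is not necessarily how Pixi performs its detection.

```python
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """cuDriverGetVersion encodes versions as 1000 * major + 10 * minor."""
    return raw // 1000, (raw % 1000) // 10


def driver_cuda_version(libname: str = "libcuda.so.1") -> Optional[Tuple[int, int]]:
    """Highest CUDA version the installed driver supports, or None if unavailable."""
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None  # no driver library visible (e.g. no GPU passed to the container)
    raw = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
    if libcuda.cuDriverGetVersion(ctypes.byref(raw)) != 0:
        return None
    return decode_cuda_version(raw.value)


def meets_requirement(found: Optional[Tuple[int, int]], required: Tuple[int, int] = (11, 8)) -> bool:
    """True if the detected driver supports at least the required CUDA version."""
    return found is not None and found >= required
```

Running meets_requirement(driver_cuda_version()) inside the GPU image would show whether the driver is visible at that point at all, which is the first thing to rule out for the __cuda error above.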

In general, our GPU stuff is kind of brittle.

We're mixing conda-forge and nvidia channel packages, which I think may lead to weird things.
- From general googling, it seems people prefer to prioritize versions in the nvidia channel over conda-forge (since Nvidia maintains those).
This whole ecosystem with conda-forge, nvidia, and pytorch channels seems really confusing and brittle. They also keep changing which metapackages they use to pin versions, so compatibility ends up very brittle.
- For example, cudatoolkit seems to be outdated, but some packages like tensorflow-gpu still seem to depend on it.
TensorFlow via conda-forge gives me less confidence because I don't think the TensorFlow maintainers actually maintain it. If TensorFlow causes problems, we might just want to install it from PyPI, because that's their official instruction.
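To make "prioritize the nvidia channel over conda-forge" concrete: under strict channel priority, a package is taken only from the first channel in the list that provides it, even if a lower-priority channel has a newer build. The toy lookup below shows that rule with made-up channel contents; it illustrates the concept only and does not reproduce Pixi's solver or its configuration options.

```python
from typing import Dict, List, Optional, Tuple

# Made-up channel contents: package name -> available versions.
CHANNELS: Dict[str, Dict[str, List[str]]] = {
    "nvidia": {"cuda-toolkit": ["11.8.0"]},
    "conda-forge": {"cuda-toolkit": ["12.0.0"], "numpy": ["1.26.4"]},
}


def pick_channel(package: str, priority: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Strict priority: the first channel that has the package wins outright."""
    for channel in priority:
        versions = CHANNELS.get(channel, {}).get(package)
        if versions:
            return channel, versions
    return None
```

Listing nvidia first means cuda-toolkit comes only from nvidia here, while packages nvidia doesn't provide still fall through to conda-forge.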

r-b-g-b

Looks great!

I can re-run make update-lockfile and nothing changes
If I delete pixi.lock it regenerates without error (nicely, regenerating from scratch will result in a different pixi.lock, proving that running on top of an existing pixi.lock doesn't change anything it doesn't need to)
It's fast!

One suggested change, on principle more than anything else. Otherwise, looks good to me!

runtime/Dockerfile-lock

klwetstone · 2024-08-15T18:39:04Z

@r-b-g-b ready for a final look!

r-b-g-b

Smallest nit! Looks really good!

runtime/Dockerfile-lock

jayqi added 3 commits July 29, 2024 19:06
- Use pixi for dependency management (06a6407)
- Fix tensorflow (178364b)
- Move shared dependencies into a base feature (bef25dc)

klwetstone reviewed Jul 30, 2024

pixi.toml (outdated, resolved)

klwetstone added 4 commits July 30, 2024 15:38
- move pixi files (c24358d)
- remove conda-lock files (e897d08)
- dockerfile scrap work (f27fc7d)
- update dockerfile (c355861)
- use pixi docker base image (d1c728d)

klwetstone marked this pull request as draft July 30, 2024 22:05

- work updating test_lockfile (bbd31dc)
- update make update-lockfiles for pixi (b2f40fb)

klwetstone added 4 commits August 6, 2024 13:36
- update lockfile (33e22bb)
- update makefile (4a83b05)
- remove test_lockfile.py adapted for pixi (ea9b46a)
- makefile logging (c2290a5)

klwetstone marked this pull request as ready for review August 6, 2024 17:53

jayqi mentioned this pull request Aug 9, 2024: [Into Pixi PR #4] Fix Dockerfiles #5 (merged)

jayqi and others added 5 commits August 9, 2024 15:47
- Use a cuda base image (8b97f0f)
- Hardcode python executable for test (07594a4)
- Remove extra Python (8188f71)
- pass CPU_OR_GPU to entrypoint.sh (925ac16)
- log cpu or gpu in entrypoint (5ec5e64)

r-b-g-b approved these changes Aug 14, 2024

runtime/Dockerfile-lock (outdated, resolved)

klwetstone added 6 commits August 14, 2024 14:06
- use jammy for locking (3d6e80e)
- test pixi.lock update (91ddba1)
- hacky pixi dependency fix (aa6c968)
- revert pixi.lock (f1adf43)
- only log in smoke tests (6070c3e)
- fix mounting for lockfile generation (9d8dfec)

klwetstone added 2 commits August 15, 2024 17:19
- revert makefile changes for testing (a52df64)
- fix comment (ef2d0ba)

r-b-g-b approved these changes Aug 15, 2024

runtime/Dockerfile-lock (outdated, resolved)

- omit trailing slash (bd39cee)

klwetstone merged commit 9824c1f into main Aug 16, 2024
2 checks passed

klwetstone deleted the jyq-pixi branch August 16, 2024 13:22

klwetstone restored the jyq-pixi branch August 16, 2024 15:48

klwetstone deleted the jyq-pixi branch August 16, 2024 15:48
